\documentclass[twocolumn]{article}
\usepackage{amsmath,cite}
\usepackage{graphicx}

\usepackage{amssymb}

\usepackage{url}

\usepackage[framed,numbered,autolinebreaks,useliterate]{../common/mcode}

\graphicspath{{figures/}}

\title{SCATNET Architecture Document}

\begin{document}
	
\maketitle

\section{Filters}

\subsection{Filter Structure}

Filters are defined by a signal size \mcode{[N,M]}, a filter type (Morlet, Gabor, spline), and wavelet-specific parameters. For one-dimensional signals, $M = 1$.

Filter parameters are specified in a parameters structure, \mcode{fparam}, containing the following fields:
\begin{itemize}
	\item \mcode{fparam.filter_type}: The wavelet type, such as `morlet\_1d', `morlet\_2d', `spline\_1d', for example.
	\item \mcode{fparam.precision}: The numeric precision of the filters. Either \mcode{'double'} or \mcode{'single'}.
\end{itemize}
In addition, the \mcode{fparam} structure would contain parameters specific to the wavelet type chosen (see below).

Once filter parameters are entered, the \mcode{filter_bank} function is called
to generate the filter bank:
\begin{lstlisting}
	filters = filter_bank([N M], fparam);
\end{lstlisting}
This function will then call the appropriate filter bank function (\mcode{spline_filter_bank_1d}, \mcode{morlet_filter_bank_2d}, etc.) depending on the value of \mcode{fparam.filter_type} and put it in a cell array. 

If one of the parameters is an array (except for \mcode{fparam.filter_type} and \mcode{fparam.precision}, which have to be cell arrays), \mcode{filter_bank} will create multiple filter banks and output them. For example, if \mcode{fparam.filter_type} equals \mcode{\{'gabor\_1d','spline_1d'\}}, \mcode{filter_bank} will output a cell array of two filter banks, one with Gabor wavelets and one with spline wavelets. If parameter fields are of different sizes, the shorter ones are extended by concatenating the last value the required number of times.

The returned structure, \mcode{filters}, contains the filters $\psi$ and $\phi$ that form the filter bank, as well as meta information. Specifically, the fields are:
\begin{itemize}
	\item \mcode{filters.psi}: A set of wavelet filters $\psi_\lambda$ (for definition, see below)
	\item \mcode{filters.phi}: A set of lowpass filter(s) $\phi$ (for definition, see below)
	\item The parameters given in \mcode{fparam} and the signal size $\mcode{[N,M]}$. For example, \mcode{filter.filter_type} gives the type of filters in \mcode{filters.psi} and \mcode{filters.phi}.
\end{itemize}

Each filter set (be it \mcode{filters.phi} or \mcode{filters.psi}), is a structure \mcode{fset} containing the following:
\begin{itemize}
	\item \mcode{fset.filter}: A cell array of the actual filter coefficients. These coefficients are implementation-dependent and can encode the filter spatially, in the Fourier domain, at different resolutions, etc.
	\item \mcode{fset.meta}: Contains meta information on the filters. Specifically, it has two fields: \mcode{fset.meta.j}, which is the scale indices, and \mcode{fset.meta.theta}, which is the angle indices (for two-dimensional filters). Both \mcode{fset.meta.j} and \mcode{fset.meta.theta} are of the same length as \mcode{fset.filter}.
\end{itemize}

The scale and angle indices are non-negative integers. The scale index rises with increasing scale, while the angle index rises with increasing angle (counter-clockwise).

\subsection{Morlet/Gabor filter bank}

In addition to the parameters listed above, the Morlet/Gabor filter bank has the following options:
\begin{itemize}
	\item \mcode{fparam.Q}: The number of wavelets per octave. By default $1$.
	\item \mcode{fparam.J}: The number of wavelet scales.
	\item \mcode{fparam.B}: The reciprocal octave bandwidth of the wavelets. By default \mcode{fparam.Q}.
	\item \mcode{fparam.sigma_psi}: The standard deviation of the mother wavelet in space. By default calculated from \mcode{fparam.B}.
	\item \mcode{fparam.sigma_phi}: The standard deviation of the scaling function in space. By default calculated from \mcode{fparam.B}.
	\item \mcode{fparam.slant}: The slant of the mother wavelet ellipse in frequency.
	\item \mcode{fparam.nb_angle}: The number of wavelet angles.
\end{itemize}
The maximal wavelet bandwidth (in space) is determined by $2^{J/Q}$ times the bandwidth of the mother wavelet, which is proportional to \mcode{sigma\_psi}. If \mcode{sigma\_psi} is smaller than a certain threshold, a number of constant-bandwidth filters are added, linearly spaced, to cover the low frequencies.

Again, we can specify different filter banks by setting \mcode{fparam.Q} and \mcode{fparam.J}, etc. to arrays instead of scalars. This is often useful if the nature of the signal is different at different orders, which is usually the case in audio.

To calculate the appropriate \mcode{J} necessary for a given window size \mcode{T} and filter parameters \mcode{Q}, etc, there is the conversion function \mcode{T_to_J}, which is called using:
\begin{lstlisting}
	fparam.J = T_to_J(T, fparam.Q);
\end{lstlisting}
This will ensure that the maximum wavelet scale, and thus the averaging scale of the lowpass filter, will be approximately \mcode{T}.

\subsection{Spline filter bank}

In addition to the parameters listed above, the spline filter bank has the following options:
\begin{itemize}
	\item \mcode{fparam.J}: The number of wavelet scales.
	\item \mcode{fparam.spline_order}: The order of the splines. Only linear (spline order $1$) and cubic (spline order $3$) are supported.
\end{itemize}
The maximal bandwidth is specified here by $2^J$.

\section{Wavelet and Scattering Transforms}

\subsection{Wavelet Transform}

Given a signal \mcode{x}, a filter bank \mcode{filters}, and an options structure \mcode{options}, the wavelet transforms \mcode{wavelet_1d} and \mcode{wavelet_2d} decompose the former into a lowpass part \mcode{x_phi} and a cell array of its wavelet coefficients \mcode{x_psi}. The 1D version is called in the following manner:
\begin{lstlisting}
	[x_phi, x_psi] = wavelet_1d(x, filters, options)
\end{lstlisting}
The outputs are sampled according to their frequency bandwidth.

This transform can also be applied on a layer of coefficients, representing layers of a scattering transform. These layers are defined as structures with fields:
\begin{itemize}
	\item \mcode{layer.signal}: A cell array of signals.
	\item \mcode{layer.meta}: The meta information on the signals, such as their path, their resolutions, etc.
\end{itemize}
The signals are one-dimensional or two-dimensional arrays while \mcode{meta} contains the fields \mcode{meta.j} and \mcode{meta.theta}, which are two-dimensional
arrays. The first dimension has length corresponding \mcode{layer.signal} while the second dimension as length corresponding to the order of the coefficients.
The path of the $p$th coefficient is thus encoded in \mcode{meta.j(:,p)} and \mcode{meta.theta(:,p)}, respectively.

The corresponding layer transforms are \mcode{wavelet_layer_1d} and \mcode{wavelet_layer_2d}, the former being applied to the layer \mcode{U} by calling
\begin{lstlisting}
	[U_phi, U_psi] = wavelet_layer_1d(U, filters, options);
\end{lstlisting}

In addition, there is a function \mcode{modulus_layer}, which computes the complex modulus of the coefficients in the layer.

Combining the wavelet transform with the modulus operator, we can construct a scattering transform. First, we initialize $U\{1\}$ with the input signal, then 
iterate the following calls:
\begin{lstlisting}
	[S{m}, W] = wavelet_layer_1d(U{m}, filters, options);
	U{m+1} = modulus_layer(W);
\end{lstlisting}
Then \mcode{S\{m+1\}} and \mcode{U\{m+1\}} will contain the $m$th-order scattering and wavelet modulus coefficients, respectively.

In the next section, we see how this can be automated.

\subsection{Scattering Transform}

By alternating wavelet transforms and modulus operators, we obtain the scattering transform. Specifically, the \mcode{scat} function takes a signal, an options 
structure, a structure defining the wavelet cascade, and returns the scattering coefficients (or wavelet modulus coefficients, if desired). The scattering 
coefficients are output as a cell array of layers \mcode{S}.

To create this cascade structure, a factory function, \mcode{wavelet_factory_1d} or \mcode{wavelet_factory_2d} is used, which takes as input the signal size, a
 filter parameter structure, a scattering options structure, and the order of the transform. The scattering transform could be used like the following

\begin{lstlisting}
	fparam.filter_type = {'gabor_1d', 'morlet_1d'};
	fparam.Q = [8 1];
	fparam.J = T_to_J(8192,fparam.Q);
	
	options = struct();
	
	wavelet = wavelet_factory_1d(N, fparam, options, 2);
	
	S = scat(x, wavelet);
\end{lstlisting}

The above code will compute a filter bank for signals of length $N$, with the first filter bank consisting of $8$ filters per octave. The second filter bank 
will only have one wavelet per octave, with the corresponding bandwidth, and $13$ octaves of wavelets. The averaging scale of both filter banks is set to $T = 8192$.
These filters are then used to specify the wavelet transforms that are stored in the cascade structure.

To obtain the wavelet modulus coefficients \mcode{U}, \mcode{scat} is called for two outputs, as in
\begin{lstlisting}
[S, U] = scat(x, wavelet);
\end{lstlisting}
The structure of \mcode{U} follows that of \mcode{S}. That is it consists of a cell array representing each layer of wavelet modulus coefficients. The first layer 
corresponds to the zeroth-order wavelet modulus coefficients, which is the original signal. The second layer contains the first-order wavelet modulus coefficients, 
which are the signal convolved with the filters with the modulus applied, and so on. The coefficients in \mcode{S\{m+1\}} are thus the $m$th-order scattering 
coefficients obtained from smoothing the coefficients in \mcode{U\{m+1\}}.

\section{Manipulating, Displaying, Formatting}

\subsection{Renormalization and Logarithm}
Often, second- and higher-order coefficients will reproduce information from their parent coefficients. That is, they will be highly correlated. To reduce this, 
the \mcode{renorm\_scat} function renormalizes each coefficient by dividing it by its parent coefficient. Similarly, the \mcode{log\_scat} function calculates 
the logarithm of each coefficient.

These functions both act on the output of \mcode{scat}, and so can be called on the scattering transform \mcode{S} like:
\begin{lstlisting}
	S_renorm = renorm_scat(S);
	S_renorm_log = log_scat(S_renorm);
\end{lstlisting}

\subsection{Display (1D)}
Two functions are available to display scattering coefficients, \mcode{display_slice} and \mcode{display_multifractal}. They both take as input a scattering transform \mcode{S} and a time point \mcode{t}.

\subsection{Display (2D)}
To display the scattering of 2d functions, 
\begin{lstlisting}
image_scat(sx);
\end{lstlisting}
opens one figure per layer. In each figure, images corresponding to each node of the layer are concatenated.

To display only coefficients of a certain order \mcode{m}, one can use \mcode{image_scat_layer}
\begin{lstlisting}
m = 2;
image_scat_layer(sx{m+1});
\end{lstlisting}

To arrange the coefficients in a specified order (lexicographic order on $j$ for rows and lexicographic order on $\theta$ for columns), one can also use \mcode{image_scat_layer_order}
\begin{lstlisting}
var_y{1}.name = 'j';
var_y{1}.index = 1;
var_y{2}.name = 'j';
var_y{2}.index = 2;
var_x{1}.name = 'theta';
var_x{1}.index = 1;
var_x{2}.name = 'theta';
var_x{2}.index = 2;
big_img = image_scat_layer_order(S{3},var_x, var_y,1);
\end{lstlisting}

\subsection{Formatting}
In order to use the scattering coefficients for classification, they need to
be in a vector format. This is obtained using the function
\mcode{format_scat}, which arranges scattering coefficients into a
two-dimensional table, the first dimension corresponding to a scattering
coefficient index and the second dimension corresponding to time/space. It
also returns a meta structure that specifies the order, scale, etc. of each
scattering coefficient. The following example illustrates its usage:

\begin{lstlisting}
	[sc_table, meta] = format_scat(S);
	
	% plot the 2nd-order coefficients
	% corresponding to scale (3, 7)
	ind = find(meta.order==2 & meta.j(1,:)==3 & meta.j(2,:)==7);
	plot(sc_table(:,ind));
	
	% calculate the energy of 1st-order
	% coefficients
	ind = find(meta.order==1);
	E2 = norm(sc_table(:,ind),'fro')^2;
\end{lstlisting}

Note that if two or more filter banks are used when calculating \mcode{S}, formatting is only possible if the lowpass filters $\phi$ have the same bandwidth, since otherwise scattering coefficients of different orders will have different resolutions and so cannot be fitted into the same matrix without resampling. Using the \mcode{T_to_J} function will ensure that this is always the case.

\section{Classification}

\subsection{Batch Computation}

To compute scattering coefficients for a database of signals, we first define a source. A source is a set of files, each file containing a number of objects to be classified. Each object has a class associated with it and a location within the file.

The source can be created using \mcode{create_src(directory, extract_objects_fun)} (or one of its wrappers, such as \mcode{gtzan_src}). Given a directory, this function recursively traverses it, looking for `.jpg', `.wav', or `.au' files. For each such file, it calls \mcode{extract_objects_fun} to determine its constituent objects and their classes. To add a new database, it suffices to write the corresponding \mcode{extract_objects_fun} function.

The \mcode{extract_objects_fun} function has the following signature
\begin{lstlisting}
[objects, classes] = extract_objects_fun(file);
\end{lstlisting}
Given a file, it returns a structure array \mcode{objects} of the objects it contains and a cell array of their class names \mcode{classes}. 

Most often, a file will only contain one object, so \mcode{objects} is a  structure of size one, but this can vary from database to database (for example in speech recognition a file corresponds to a recording of a phrase, which can contain different phones to be classified).

The objects structure has the fields \mcode{objects.u1} and \mcode{objects.u2} which correspond to the lower-left and upper-right corners (start- and endpoints) of the object in a 2D (or 1D) signal. These bounds are inclusive. For the case where a file only contains one object, the bounds will simply be the bounds of the signal.

Given a directory and this function, \mcode{create_src} returns a source structure \mcode{src}, which has the following fields
\begin{itemize}
	\item \mcode{src.classes}: A cell array of the class names.
	\item \mcode{src.files}: A cell array of the file names.
	\item \mcode{src.objects}: A struct array of the objects returned by applying \mcode{objects_fun} to each filename, but with two supplementary fields: \mcode{src.objects.ind}, corresponding to the index of its parent filename in \mcode{src.files} and \mcode{src.objects.class}, corresponding to the index of its class in \mcode{src.classes}.
\end{itemize}

The scattering coefficients of the objects in \mcode{src} can now be calculated using the function \mcode{prepare_database}. This function takes for input the \mcode{src} structure, a cell array of function handles, \mcode{feature_fun}, which contain the functions calculating the feature representations, and an options structure.

A ``feature function'' takes as an input the file data and a structure array of its constituent objects and returns the corresponding feature vectors as a 2D matrix. Its signature is thus
\begin{lstlisting}
features = feature_fun(signal, objects);
\end{lstlisting}
Since this flexibility is often not necessary, a simplified feature function is also allowed, which only takes inputs \mcode{obj_signal} in the form of a 2D matrix, each column corresponding to a signal to be transformed, and returns the associated feature vectors \mcode{obj_features}. It has the following signature
\begin{lstlisting}
obj_features = feature_fun(obj_signal);
\end{lstlisting}
In this case, the function \mcode{feature_wrapper} is called, which extracts the objects from the file data \mcode{signal}, applies the feature function to each object, and collects the output features.

Note that a feature function is not required to output only one feature vector for each object. If more than one feature vector is associated to a single object, the classifier will classify each feature vector separately and aggregate the results, either through averaging the approximation error (for the affine space classifier) or through voting (for the SVM classifier).

The following code sets up a feature function array for the normalized log-scattering transform:
\begin{lstlisting}
fparam.filter_type = {'gabor_1d', 'morlet_1d'};
fparam.Q = [8 1];
fparam.J = T_to_J(8192,fparam.Q);

options = struct();

cascade = wavelet_factory_1d(N, fparam, options, 2);

feature_fun = {@(x)(format_scat( ...
	log_scat(renorm_scat(scat(x, cascade)))))};
\end{lstlisting}

The feature vectors are then computed by calling \mcode{prepare_database}, as in:
\begin{lstlisting}
database = prepare_database(src, feature_fun);
\end{lstlisting}
The resulting \mcode{database} structure then contains three fields:
\begin{itemize}
	\item \mcode{database.src}: The original \mcode{src} structure from which the features were computed.
	\item \mcode{database.features}: The feature vectors, arranged in a 2D matrix, each feature vector forming a column.
	\item \mcode{database.indices}: The indices corresponding to each object in \mcode{database.src}. Specifically, \mcode{database.features(:,database.indices\{k\})} contains the features vector(s) calculated from \mcode{database.src.objects(k)}.
\end{itemize}

\subsection{Training/Testing}

Once the database of feature vectors is constructed, models can be trained and tested. To create a train/test partition, the function \mcode{create_partition} is available. Given a source \mcode{src} and a ratio, it partitions the objects so that each class is divided evenly into the training and testing set according to the ratio. For example, the following code defines a train/test partition with $80\%$ of the objects in the training set:
\begin{lstlisting}
[train_set, test_set] = create_partition(src, 0.8);
\end{lstlisting}
Here, \mcode{train_set} will contain the indices in \mcode{src.objects} that correspond to the training instances, while \mcode{test_set} will contain those corresponding to testing instances.

Alternatively, a partition can be defined manually by traversing the \mcode{src.objects} array and recording the desired indices.

Given a partition, a model can be trained using the appropriate training function. We will denote the training function by \mcode{train}, but in reality it will be either \mcode{affine_train} or \mcode{svm_train}. The training is then done by:
\begin{lstlisting}
model = train(database, train_set, train_opt);
\end{lstlisting}
The structure \mcode{train_opt} contains various parameters for training the model, which will depend on the type of classifier used.

To test the model, the functions \mcode{affine_test} and \mcode{svm_test} are used. Here, we denote the testing function by \mcode{test}. The labels obtained for the testing instances specified by \mcode{test_set} are then obtained by:
\begin{lstlisting}
labels = test(database, model, test_set);
\end{lstlisting}
To calculate the classification error, the \mcode{classif_err} is used:
\begin{lstlisting}
err = classif_err(labels, test_set, src);
\end{lstlisting}
Note that the original \mcode{src} structure is necessary here to verify the class membership of the testing instances.

\subsection{Affine Classifier}

The affine classifier is defined by the functions \mcode{affine_train} and \mcode{affine_test}. The former takes as an option the number of dimensions, \mcode{train_opt.dim} used for the affine space model of each class. 

Multiple dimensions can be specified in order to test the difference in performance for different dimensions. In this case, the labels returned by \mcode{affine_test} form a 2D matrix, with the first dimension corresponding to the dimension and the second dimension corresponding to the testing instance index.

The following code calculates the minimum error for an affine space classifier of dimension between $0$ and $160$:
\begin{lstlisting}
train_opt.dim = 0:160;
model = affine_train(database, train_set, train_opt);
labels = affine_test(database, model, test_set);
err = classif_err(labels, test_set, src);
[min_err, dim_ind] = min(err);
min_dim = train_opt.dim(dim_ind);
\end{lstlisting}

For an affine space classifier, the model structure contains the dimensions for which it is defined \mcode{model.dim}, the centers of the classes \mcode{model.mu} and the direction vectors defining the affine space \mcode{model.v}.

\subsection{Support Vector Machine}

\textbf{NOTE: Requires the LIBSVM library, see \url{http://www.csie.ntu.edu.tw/~cjlin/libsvm/}}

The support vector machine classifier is defined by the functions \mcode{svm_train} and \mcode{svm_test}. The training options consist of:
\begin{itemize}
	\item \mcode{train_opt.kernel_type}: The kernel type used in the SVM. Can be \mcode{'linear'} for a linear kernel $u^Tv$ or \mcode{'gaussian'} for a Gaussian kernel $e^{-\gamma\|u-v\|^2}$.
	\item \mcode{train_opt.C}: The slack factor $C$ used for training the SVM.
	\item \mcode{train_opt.gamma}: The regularity constant $\gamma$ used in the case of a Gaussian SVM kernel.
\end{itemize}

The following code calculates the error for an SVM classifier with a linear kernel and a slack factor $C = 8$:
\begin{lstlisting}
train_opt.kernel_type = 'linear';
train_opt.C = 8;
model = svm_train(database, train_set, train_opt);
labels = svm_test(database, model, test_set);
err = classif_err(labels, test_set, src);
\end{lstlisting}

If a linear kernel is used, we can extract the discriminant vector $w$ and bias $\rho$ from the model using the function \mcode{svm_extract_w}. The function takes as input the database \mcode{database} and the SVM model \mcode{model}. It outputs the $w$s corresponding to each pair of classes in the model, arranged in the order $1$vs$2$,$1$vs$3$,\ldots$1$vs$N$,$2$vs$3$,\ldots,$2$vs$N$,\ldots,$(N-1)$vs$N$, where $N$ is the number of classes. The function is called as in:
\begin{lstlisting}
[w,rho] = svm_extract_w(database, model);
\end{lstlisting}

\end{document}
